
    Optimal decentralized control of coupled subsystems with control sharing

    Subsystems that are coupled due to dynamics and costs arise naturally in various communication applications. In many such applications, the control actions are shared between different control stations, giving rise to a \emph{control sharing} information structure. Previous studies of control sharing have concentrated on the linear quadratic Gaussian setup and a solution approach tailored to continuous-valued control actions. In this paper, a three-step solution approach for finite-valued control actions is presented. In the first step, a person-by-person approach is used to identify redundant data or a sufficient statistic for local information at each control station. In the second step, the common-information based approach of Nayyar et al.\ (2011) is used to find a sufficient statistic for the common information shared between all control stations and to obtain a dynamic programming decomposition. In the third step, the specifics of the model are used to simplify the sufficient statistic and the dynamic program. As an example, an exact solution of a two-user multiple access broadcast system is presented. Comment: Submitted to IEEE Transactions on Automatic Control.

    A Decision Theoretic Framework for Real-Time Communication

    We consider a communication system in which the outputs of a Markov source are encoded and decoded in \emph{real-time} by a finite memory receiver, and the distortion measure does not tolerate delays. The objective is to choose designs, i.e., real-time encoding, decoding, and memory update strategies that minimize a total expected distortion measure. This is a dynamic team problem with non-classical information structure [Witsenhausen:1971]. We use the structural results of [Teneketzis:2004] to develop a sequential decomposition for the finite and infinite horizon problems. Thus, we obtain a systematic methodology for the determination of jointly optimal encoding, decoding, and memory update strategies for real-time point-to-point communication systems. Comment: 10 pages, 1 figure. Forty-Third Allerton Conference on Control, Communication, and Computing.

    Sufficient statistics for linear control strategies in decentralized systems with partial history sharing

    In decentralized control systems with linear dynamics, quadratic cost, and Gaussian disturbance (also called decentralized LQG systems), linear control strategies are not always optimal. Nonetheless, linear control strategies are appealing due to their analytic and implementation simplicity. In this paper, we investigate decentralized LQG systems with a partial history sharing information structure and identify finite-dimensional sufficient statistics for such systems. Unlike prior work on decentralized LQG systems, we do not assume partial nestedness or quadratic invariance. Our approach is based on the common information approach of Nayyar \emph{et al.}\ (2013) and exploits the linearity of the system dynamics and control strategies. To illustrate our methodology, we identify sufficient statistics for linear strategies in decentralized systems where controllers communicate over a strongly connected graph with finite delays, and for decentralized systems consisting of coupled subsystems with control sharing or one-sided one-step delay sharing information structures.

    Decentralized stochastic control

    Decentralized stochastic control refers to the multi-stage optimization of a dynamical system by multiple controllers that have access to different information. Decentralization of information gives rise to new conceptual challenges that require new solution approaches. In this expository paper, we use the notion of an \emph{information-state} to explain the two commonly used solution approaches to decentralized control: the person-by-person approach and the common-information approach.

    Opportunistic capacity and error exponent regions for compound channel with feedback

    Variable length communication over a compound channel with feedback is considered. Traditionally, capacity of a compound channel without feedback is defined as the maximum rate that is determined before the start of communication such that communication is reliable. This traditional definition is pessimistic. In the presence of feedback, an opportunistic definition is given. Capacity is defined as the maximum rate that is determined at the end of communication such that communication is reliable. Thus, the transmission rate can adapt to the channel chosen by nature. Under this definition, feedback communication over a compound channel is conceptually similar to multi-terminal communication. Transmission rate is a vector rather than a scalar; channel capacity is a region rather than a scalar; error exponent is a region rather than a scalar. In this paper, variable length communication over a compound channel with feedback is formulated, its opportunistic capacity region is characterized, and lower bounds for its error exponent region are provided.

    Team Optimal Decentralized State Estimation of Linear Stochastic Processes by Agents with Non-Classical Information Structures

    We consider the problem of team optimal decentralized estimation of a linear stochastic process by multiple agents. Each agent receives a noisy observation of the state of the process and delayed observations of its neighbors (according to a pre-specified, strongly connected communication graph). Based on their observations, all agents generate a sequence of estimates of the state of the process. The objective is to minimize the total expected weighted mean square error between the state and the agents' estimates over a finite horizon. In centralized estimation with a weighted mean square error criterion, the optimal estimator does not depend on the weight matrix in the cost function. We show that this is not the case when the information is decentralized: the team optimal decentralized estimates depend on the weight matrix in the cost function. In particular, we show that the optimal estimate consists of two parts: a common estimate, which is the conditional mean of the state given the common information, and a correction term, which is a linear function of the offset of the local information from the conditional expectation of the local information given the common information. The corresponding gain depends on the weight matrix as well as on the covariance between the offsets of the agents' local information from the conditional mean of the local information given the common information. We show that the local and common estimates can be computed from a single Kalman filter and derive recursive expressions for computing the offset covariances and the estimation gains. Comment: 16 pages, 6 figures. Submitted to Automatica (second version).
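    The common estimate described above is the conditional mean of the state given the common information, which for linear-Gaussian dynamics is produced by a Kalman filter. As a rough single-agent illustration only (not the paper's decentralized filter; the model parameters a, c, q, r below are made up), a scalar Kalman filter computing conditional-mean estimates might look like:

    ```python
    def kalman_filter(ys, a=0.9, c=1.0, q=1.0, r=0.5):
        """Scalar Kalman filter for x_{t+1} = a x_t + w_t, y_t = c x_t + v_t,
        with w ~ N(0, q) and v ~ N(0, r). Returns the filtered conditional-mean
        estimates, which play the role of the common estimate in the paper."""
        xhat, P = 0.0, 1.0       # prior mean and variance (assumed)
        estimates = []
        for y in ys:
            # measurement update
            K = P * c / (c * c * P + r)
            xhat = xhat + K * (y - c * xhat)
            P = (1.0 - K * c) * P
            estimates.append(xhat)
            # time update
            xhat = a * xhat
            P = a * a * P + q
        return estimates

    est = kalman_filter([1.0, 0.8, 1.2])
    ```

    The decentralized result is stronger: the same recursion also feeds the correction-term gains, but those depend on the weight matrix and are not shown here.
    
    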

    Renewal Monte Carlo: Renewal theory based reinforcement learning

    In this paper, we present an online reinforcement learning algorithm, called Renewal Monte Carlo (RMC), for infinite horizon Markov decision processes with a designated start state. RMC is a Monte Carlo algorithm and retains the advantages of Monte Carlo methods, including low bias, simplicity, and ease of implementation, while at the same time circumventing their key drawbacks of high variance and delayed (end-of-episode) updates. The key ideas behind RMC are as follows. First, under any reasonable policy, the reward process is ergodic. So, by renewal theory, the performance of a policy is equal to the ratio of the expected discounted reward to the expected discounted time over a regenerative cycle. Second, by carefully examining the expression for the performance gradient, we propose a stochastic approximation algorithm that only requires estimates of the expected discounted reward and discounted time over a regenerative cycle and their gradients. We propose two unbiased estimators for evaluating performance gradients---a likelihood ratio based estimator and a simultaneous perturbation based estimator---and show that for both estimators, RMC converges to a locally optimal policy. We generalize the RMC algorithm to post-decision state models and also present a variant that converges faster to an approximately optimal policy. We conclude by presenting numerical experiments on a randomly generated MDP, event-triggered communication, and inventory management. Comment: 9 pages, 5 figures.
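    The renewal-theoretic identity, that performance equals the ratio of expected discounted reward to expected discounted time over a regenerative cycle, can be sketched as a plain Monte Carlo estimator. This is a simplified reading of the RMC idea, not the authors' full algorithm, and the two-state chain and its rewards below are invented for illustration:

    ```python
    import random

    def renewal_performance(step, start, gamma=0.9, num_cycles=2000, seed=0):
        """Estimate policy performance as the ratio of expected discounted
        reward to expected discounted time, averaged over regenerative
        cycles that begin and end at the designated start state."""
        rng = random.Random(seed)
        total_reward, total_time = 0.0, 0.0
        for _ in range(num_cycles):
            s, disc = start, 1.0
            cycle_reward, cycle_time = 0.0, 0.0
            while True:
                s_next, r = step(s, rng)
                cycle_reward += disc * r
                cycle_time += disc
                disc *= gamma
                s = s_next
                if s == start:           # cycle regenerates at the start state
                    break
            total_reward += cycle_reward
            total_time += cycle_time
        return total_reward / total_time

    # hypothetical two-state chain: reward 1 in state 0, reward 0 in state 1
    def step(s, rng):
        return rng.choice([0, 1]), (1.0 if s == 0 else 0.0)

    perf = renewal_performance(step, start=0)
    ```

    Because each cycle collects reward only at its first step here, the estimate reduces to one over the mean discounted cycle length; the actual RMC algorithm additionally estimates gradients of both cycle quantities to update the policy.
    
    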

    Distortion-transmission trade-off in real-time transmission of Markov sources

    The problem of optimal real-time transmission of a Markov source under constraints on the expected number of transmissions is considered, both for the discounted and long-term average cases. This setup is motivated by applications where transmission is sporadic and the cost of switching on the radio and transmitting is significantly more important than the size of the transmitted data packet. For this model, we characterize the distortion-transmission function, i.e., the minimum expected distortion that can be achieved when the expected number of transmissions is less than or equal to a particular value. In particular, we show that the distortion-transmission function is a piecewise linear, convex, and decreasing function. We also give an explicit characterization of each vertex of the piecewise linear function. To prove the results, the optimization problem is cast as a decentralized constrained stochastic control problem. We first consider the Lagrange relaxation of the constrained problem and identify the structure of optimal transmission and estimation strategies. In particular, we show that the optimal transmission is of a threshold type. Using these structural results, we obtain dynamic programs for the Lagrange relaxations. We identify the performance of an arbitrary threshold-type transmission strategy and use the idea of calibration from multi-armed bandits to determine the optimal transmission strategy for the Lagrange relaxation. Finally, we show that the optimal strategy for the constrained setup is a randomized strategy that randomizes between two deterministic strategies that differ only at one state. By evaluating the performance of these strategies, we determine the shape of the distortion-transmission function. These results are illustrated using an example of transmitting a birth-death Markov source.
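    The threshold structure of the transmission strategy can be illustrated with a small simulation. The specific birth-death chain, its parameters, and the hold-last-value estimator below are invented for illustration; this sketches the distortion-versus-transmission trade-off, not the paper's exact characterization:

    ```python
    import random

    def simulate_threshold(threshold, p=0.3, horizon=10000, seed=1):
        """Simulate a birth-death Markov source X_t (moves +1 or -1 each with
        probability p, else stays) under a threshold transmission policy:
        transmit when |X_t - Xhat| >= threshold; the receiver's estimate Xhat
        holds the last transmitted value. Returns (average distortion,
        average transmissions per step)."""
        rng = random.Random(seed)
        x, xhat = 0, 0
        distortion = transmissions = 0
        for _ in range(horizon):
            u = rng.random()
            if u < p:
                x += 1
            elif u < 2 * p:
                x -= 1
            if abs(x - xhat) >= threshold:
                xhat = x                  # transmit: receiver learns the state
                transmissions += 1
            distortion += abs(x - xhat)
        return distortion / horizon, transmissions / horizon

    d1, t1 = simulate_threshold(1)
    d2, t2 = simulate_threshold(2)
    # a larger threshold trades fewer transmissions for more distortion
    ```

    Sweeping the threshold and plotting the resulting (transmission rate, distortion) pairs traces out an achievable trade-off curve; the paper shows the optimal curve is piecewise linear, convex, and decreasing, achieved by randomizing between adjacent threshold strategies.
    
    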

    Optimal Performance of Feedback Control Systems with Limited Communication over Noisy Channels

    A discrete time stochastic feedback control system with a noisy communication channel between the sensor and the controller is considered. The sensor has limited memory. At each time, the sensor transmits an encoded symbol over the channel and updates its memory. The controller receives a noisy version of the transmitted symbol and generates a control action based on all its past observations and actions. This control action is fed back into the system. At each stage the system incurs an instantaneous cost depending on the state of the plant and the control action. The objective is to choose encoding, memory updating, and control strategies to minimize the expected total cost over a finite horizon, or the expected discounted cost over an infinite horizon, or the expected average cost per unit time over an infinite horizon. For each case we obtain a sequential decomposition of the optimization problem. The results are extended to the case when the sensor makes an imperfect observation of the state of the system. Comment: Preprint of paper to appear in CDC 2006. 8 pages, 2 figures.

    Sufficient conditions for the value function and optimal strategy to be even and quasi-convex

    Sufficient conditions are identified under which the value function and the optimal strategy of a Markov decision process (MDP) are even and quasi-convex in the state. The key idea behind these conditions is the following. First, sufficient conditions for the value function and optimal strategy to be even are identified. Next, it is shown that if the value function and optimal strategy are even, then one can construct a "folded MDP" defined only on the non-negative values of the state space. Then, the standard sufficient conditions for the value function and optimal strategy to be monotone are "unfolded" to identify sufficient conditions for the value function and the optimal strategy to be quasi-convex. The results are illustrated by using an example of power allocation in remote estimation. Comment: 8 pages.
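    The folding construction can be sketched on a toy symmetric Markov reward process. The chain, cost, and boundary behavior below are invented, and actions are omitted, so this only illustrates the evenness of the value function and the equivalence of the folded computation, not the full MDP result:

    ```python
    def value_iteration(cost, trans, states, gamma=0.9, iters=200):
        """Plain value iteration for a Markov reward process:
        V(x) = cost(x) + gamma * sum_y P(x, y) V(y)."""
        V = {s: 0.0 for s in states}
        for _ in range(iters):
            V = {s: cost(s) + gamma * sum(p * V[y] for y, p in trans(s))
                 for s in states}
        return V

    N = 3
    states = list(range(-N, N + 1))

    def cost(x):                 # even cost, as required by the conditions
        return abs(x)

    def trans(x):                # symmetric random walk, reflecting at +-N
        lo, hi = max(x - 1, -N), min(x + 1, N)
        return [(lo, 0.5), (hi, 0.5)]

    V = value_iteration(cost, trans, states)

    # the folded chain lives only on non-negative states: map y to |y|
    def trans_folded(x):
        return [(abs(y), p) for y, p in trans(x)]

    Vf = value_iteration(cost, trans_folded, list(range(N + 1)))
    ```

    Since the cost and dynamics are symmetric about zero, the value function is even, and folding the chain onto the non-negative states reproduces it exactly; the paper's contribution is identifying general conditions under which this works and then unfolding monotonicity results into quasi-convexity.
    
    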